Basic analytic framework for processing 16S rRNA gene amplicon data. Each R chunk represents a specific analysis. Depending on the makeup of the data, such as sample names, variable names or data subsets some R chunks are optional or will need modification as needed.
The code chunks perform the following functions:
To be added: Proscutes, ANCOVA
The example data are dada2 resolved sequence variants. Briefly, extracted nucleic acid from stool samples collected from individually caged mice were amplified in triplicate using primers specific for the V4 region using primers 515F/806R. One group of mice (n=30) were treated with ampicillin for 3 days, and the other a vehicle control (n=15).
We will begin by customizing our global settings, activating packages and loading our data into R using the following steps:
Useful for standardizing how R chunks are handled by knitr. There are quite a few options you can use in this section which can be read about here, however, setting a standard figure height and width as well as an output directory and prefix name for all knitted figures can be useful.
library("tidyverse")
packageVersion("tidyverse")
## [1] '1.1.1'
library("phyloseq")
packageVersion("phyloseq")
## [1] '1.20.0'
library("RColorBrewer")
packageVersion("RColorBrewer")
## [1] '1.1.2'
library("vegan")
packageVersion("vegan")
## [1] '2.4.3'
library("gridExtra")
packageVersion("gridExtra")
## [1] '2.2.1'
library("knitr")
packageVersion("knitr")
## [1] '1.17'
library("DESeq2")
packageVersion("DeSeq2")
## [1] '1.16.1'
library("plotly")
packageVersion("plotly")
## [1] '4.7.1'
library("microbiome")
packageVersion("microbiome")
## [1] '0.99.42'
library("ggpubr")
packageVersion("ggpubr")
## [1] '0.1.5'
library("randomForest")
packageVersion("randomForest")
## [1] '4.6.12'
library("data.table")
packageVersion("data.table")
## [1] '1.10.4'
library("corrplot")
packageVersion("corrplot")
## [1] '0.77'
The output from a standard dada2 workflow should be an RDS file. In this case the file is called ps0.rds. You may have already merged your mapping file data (sample variables) with the rds file. However, you will likely add or modify this mapping file as you progress, so it is useful to initiate an import/merge of a modifiable mapping file.
# Read in an RDS file containing taxonomic and count information
ps0 <- readRDS("ps0.rds")
# Read in a mapping file containing sample variable information
map <- import_qiime_sample_data("mapping.txt")
# Merge the RDS file with the mapping file
ps0 <- merge_phyloseq(ps0, map)
# Perform a few sanity checks
sample_variables(ps0)
## [1] "X.SampleID" "NumberReads" "BarcodeSequence"
## [4] "LinkerPrimerSequence" "Plate" "Well"
## [7] "Name" "SampleNo" "DNAPlateNumber"
## [10] "Cell" "Experiment" "Microbiota"
## [13] "Sex" "GroupedCage" "IndividualCage"
## [16] "DayPostTreatment" "Treatment" "Virus"
## [19] "DayPostInoculation" "SurvivalStatus" "DaysToDeath"
## [22] "SampleCollected" "SampesProcessed" "Description"
## [25] "continuous_var_1" "continuous_var_2"
ntaxa(ps0)
## [1] 443
rank_names(ps0)
## [1] "Kingdom" "Phylum" "Class" "Order" "Family" "Genus" "Species"
get_taxa_unique(ps0, "Phylum")
## [1] "p__Bacteroidetes" "p__Verrucomicrobia" "p__Firmicutes"
## [4] "p__Tenericutes" "p__Cyanobacteria" "p__Proteobacteria"
## [7] "p__Actinobacteria" NA "p__Deferribacteres"
The default sorting for ggplot2 is alphabetical. So if you want to make a box plot comparing Shannon diversity between wild-type and knockout mice, it will by default always place knockout on the left and wild-type on the right. However, you may wish to switch this so the knock-out is on the right and wild-type on the left.
This can be done on a plot-by-plot basis, however, it is likely that you will want all of your plots to reflect this customization throughout the entire analysis, so it is useful to have an R chunk at the very beginning of your workflow to specify order and label names.
In the example data, most of the analysis will be done comparing the sample variable “treatment” which is either KoolAid or Ampicillin in the mapping file. Due to default ordering, Ampicillin will always appear before Koolaid. We want the control displayed first (on the left of most plots). We also want to use the more formal “Vehicle” to indicate that a “vehicle control” was used. Koolaid is added to the water to encourage mice to drink the antibiotic laden water. This would be indicated in the methods of a manuscript, but the plots should be more formal and indicate that this was a vehicle control. The code chunk below provides examples for reordering and relabeling sample variable data.
# Examine the way the sample data look now
levels(sample_data(ps0)$Treatment)
## [1] "Ampicillin" "Koolaid"
# Reorder it so vehicle appears first
sample_data(ps0)$Treatment <- factor(sample_data(ps0)$Treatment, levels = c("Koolaid", "Ampicillin"))
levels(sample_data(ps0)$Treatment)
## [1] "Koolaid" "Ampicillin"
# Change Koolaid into Vehicle
sample_data(ps0)$Treatment <- factor(sample_data(ps0)$Treatment, labels = c("Vehicle", "Ampicillin"))
# Check that it worked as you expected
sample_data(ps0)$Treatment
## [1] Vehicle Vehicle Vehicle Vehicle Vehicle Vehicle
## [7] Vehicle Vehicle Vehicle Vehicle Vehicle Vehicle
## [13] Vehicle Vehicle Vehicle Ampicillin Ampicillin Ampicillin
## [19] Ampicillin Ampicillin Ampicillin Ampicillin Ampicillin Ampicillin
## [25] Ampicillin Ampicillin Ampicillin Ampicillin Ampicillin Ampicillin
## [31] Ampicillin Ampicillin Ampicillin Ampicillin Ampicillin Ampicillin
## [37] Ampicillin Ampicillin Ampicillin Ampicillin Ampicillin Ampicillin
## [43] Ampicillin Ampicillin Ampicillin
## Levels: Vehicle Ampicillin
Data assessment consists of 2 steps:
Begin by running the following R chunk to produce several summary plots and basic statistics about the RSV’s and samples in your data.
# Create a new data frame of the sorted row sums, a column of sorted values from 1 to the total number of individuals/counts for each RSV and a categorical variable stating these are all RSVs.
readsumsdf = data.frame(nreads = sort(taxa_sums(ps0), TRUE),
sorted = 1:ntaxa(ps0),
type = "RSVs")
# Add a column of sample sums (total number of individuals per sample)
readsumsdf = rbind(readsumsdf,
data.frame(nreads = sort(sample_sums(ps0), TRUE),
sorted = 1:nsamples(ps0),
type = "Samples"))
# Make a data frame with a column for the read counts of each sample for histogram production
sample_sum_df <- data.frame(sum = sample_sums(ps0))
# Make plots
# Generates a bar plot with # of reads (y-axis) for each taxa. Sorted from most to least abundant
# Generates a second bar plot with # of reads (y-axis) per sample. Sorted from most to least
p.reads = ggplot(readsumsdf, aes(x = sorted, y = nreads)) +
geom_bar(stat = "identity") +
ggtitle("RSV Assessment") +
scale_y_log10() +
facet_wrap(~type, scales = "free") +
ylab("# of Sequences")
# Histogram of the number of Samples (y-axis) at various read depths
p.reads.hist <- ggplot(sample_sum_df, aes(x = sum)) +
geom_histogram(color = "black", fill = "firebrick3", binwidth = 150) +
ggtitle("Distribution of sample sequencing depth") +
xlab("Read counts") +
ylab("# of Samples")
# Final plot, side-by-side
grid.arrange(p.reads, p.reads.hist, ncol = 2)
# Basic summary statistics
summary(sample_sums(ps0))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 368 806 1593 14086 33143 49325
There is an R script in the /scripts directory called plot_violin which encapsulates that entire R chunk into a single command.
The above data assessment is useful for getting an idea of 1) the overall taxonomic distribution of your reads (left plot). This will normally be a “long tail” with some taxa being highly abundant in the data tapering off to taxa with very few reads, 2) probably more valuable than the first plot is how many reads are in each sample (middle plot). Very low read count can be indicative of a failed reaction and 3) a histogram of the number of samples at various “bins” of read depth. Each of these plots will help give an understanding of how your data are structured across taxa and samples and will vary depending on the nature of your samples.
Samples with unexpectedly low number of sequences can be safely removed. This is an intuitive process and should be instructed by your understanding of the samples in your study. For example, if you have 5 samples from stool samples, one would expect to obtain thousands, if not several thousands of RSVs. This may not be the case for other tissues, such as spinal fluid or tissue samples. Similarly, you would not expect thousands of RSV from samples obtained from antibiotic treated organisms. Following antibiotic treatment you may be left with dozens or hundreds of RSVs. So contextual awareness about the biology of your system should guide your decision to remove samples based on RSV number. The basic idea is to remove samples with “unexpected” numbers of RSV.
Importantly, at each stage you should document and justify your decisions. If you are concerned that sample removal will alter the interpretation of your results, you should run your analysis on the full data and the data with the sample(s) removed to see how the decision affects your interpretation.
The above plots provide overall summaries about the number of RSVs found in all of your samples. However, they are not very useful for identifying and removing specific samples. This can be done using the following R chunk.
# Format a data table to combine sample summary data with sample variable data
ss <- sample_sums(ps0)
sd <- as.data.frame(sample_data(ps0)) # useful to coerce the phyloseq object into an R data frame
df <- merge(sd, data.frame("RSVs" = ss), by ="row.names") # merge ss with sd by row names. Rename ss to RSVs in the new data frame
# Plot the data by the treatment variable
y = 50 # Set a threshold for the minimum number of acceptable reads. Can start as a guess
x = "Treatment" # Set the x-axis variable you want to examine
label = "Well" # This is the label you want to overlay on the points that are below threshold y. Should be something sample specific
p.ss.boxplot <- ggplot(df, aes_string(x, y = "RSVs", fill = x)) + # x is what you assigned it above
geom_boxplot(outlier.colour="NA") +
scale_y_log10() +
geom_hline(yintercept = y, lty = 2) + # Draws a dashed line across the threshold you set above as y
geom_jitter(alpha = 0.5, width = 0.15, size = 2)
# geom_text(aes_string(x, y="RSVs", label=label), size=3, nudge_x = 0.1) # This labels a subset that fall below threshold variable y and labels them with the label variable
p.ss.boxplot
(In progress) There is an R script in the /scripts directory called plot_violin which encapsulates that entire R chunk into a single command.
The example data does have a couple of samples with fewer than 500 RSVs. However, these come from samples obtained from antibiotic treated mice, so this fits our expectation. There is also a single sample with fewer than 1,000 RSVs in the control data which is extraordinarliy low compared to the other samples. This sample should definetely be considered for removal. When questionable samples arise you should take note of them so if there are samples which behave oddly in downstream analysis you can recall this information and perhaps justify their removal. The following R chunk shows how to identify and remove samples if needed.
# Use the same 'subset' strategy to plot text on only those samples with fewer than y numbers of RSVs to produce a table of samples
low.RSV.samples <- subset(df, RSVs <= 1000) # Set the number to 1,000 to capture that one sample in vehicle control that is lower than 1,000
low.RSV.samples
## Row.names X.SampleID NumberReads BarcodeSequence LinkerPrimerSequence
## 15 806rcbc347 806rcbc347 1296 CTAGCTATGGAC NA
## 17 806rcbc349 806rcbc349 551 ACCTGGGAATAT NA
## 18 806rcbc350 806rcbc350 1224 CTCTGCCTAATT NA
## 19 806rcbc351 806rcbc351 775 ATATGACCCAGC NA
## 20 806rcbc352 806rcbc352 760 CTCTATTCCACC NA
## 27 806rcbc359 806rcbc359 1042 AAGTGGCTATCC NA
## 29 806rcbc361 806rcbc361 835 TCGATTGGCCGT NA
## 30 806rcbc362 806rcbc362 943 GCATTACTGGAC NA
## 31 806rcbc363 806rcbc363 1059 TTGGGCCACATA NA
## 32 806rcbc364 806rcbc364 1149 CACACAAAGTCA NA
## 37 806rcbc369 806rcbc369 1269 GTTCCTCCATTA NA
## 38 806rcbc370 806rcbc370 1370 ACCTGTCCTTTC NA
## 39 806rcbc371 806rcbc371 783 GTTCACGCCCAA NA
## 41 806rcbc373 806rcbc373 971 CATGCCAACATG NA
## 43 806rcbc375 806rcbc375 598 CCTACATGAGAC NA
## 44 806rcbc376 806rcbc376 1136 TCCGTGGTATAG NA
## 45 806rcbc377 806rcbc377 1324 TCTACGGCACGT NA
## Plate Well Name SampleNo DNAPlateNumber Cell Experiment
## 15 4 E12 806rcbc347 60 1 E12 LBT2017_1
## 17 4 F2 806rcbc349 62 1 F2 LBT2017_1
## 18 4 F3 806rcbc350 63 1 F3 LBT2017_1
## 19 4 F4 806rcbc351 64 1 F4 LBT2017_1
## 20 4 F5 806rcbc352 65 1 F5 LBT2017_1
## 27 4 F12 806rcbc359 72 1 F12 LBT2017_1
## 29 4 G2 806rcbc361 74 1 G2 LBT2017_1
## 30 4 G3 806rcbc362 75 1 G3 LBT2017_1
## 31 4 G4 806rcbc363 76 1 G4 LBT2017_1
## 32 4 G5 806rcbc364 77 1 G5 LBT2017_1
## 37 4 G10 806rcbc369 82 1 G10 LBT2017_1
## 38 4 G11 806rcbc370 83 1 G11 LBT2017_1
## 39 4 G12 806rcbc371 84 1 G12 LBT2017_1
## 41 4 H2 806rcbc373 86 1 H2 LBT2017_1
## 43 4 H4 806rcbc375 88 1 H4 LBT2017_1
## 44 4 H5 806rcbc376 89 1 H5 LBT2017_1
## 45 4 H6 806rcbc377 90 1 H6 LBT2017_1
## Microbiota Sex GroupedCage IndividualCage DayPostTreatment
## 15 JAX C57BL6/J FALSE L L5 3
## 17 JAX C57BL6/J FALSE M M2 3
## 18 JAX C57BL6/J FALSE M M3 3
## 19 JAX C57BL6/J FALSE M M4 3
## 20 JAX C57BL6/J FALSE M M5 3
## 27 JAX C57BL6/J FALSE O O2 3
## 29 JAX C57BL6/J FALSE O O4 3
## 30 JAX C57BL6/J FALSE O O5 3
## 31 JAX C57BL6/J FALSE P P1 3
## 32 JAX C57BL6/J FALSE P P2 3
## 37 JAX C57BL6/J FALSE Q Q2 3
## 38 JAX C57BL6/J FALSE Q Q3 3
## 39 JAX C57BL6/J FALSE Q Q4 3
## 41 JAX C57BL6/J FALSE R R1 3
## 43 JAX C57BL6/J FALSE R R3 3
## 44 JAX C57BL6/J FALSE R R4 3
## 45 JAX C57BL6/J FALSE R R5 3
## Treatment Virus DayPostInoculation SurvivalStatus DaysToDeath
## 15 Vehicle na d_11 Dead 11
## 17 Ampicillin na d_11 Alive na
## 18 Ampicillin na d_11 Dead 11
## 19 Ampicillin na d_11 Dead 7
## 20 Ampicillin na d_11 Alive na
## 27 Ampicillin na d_11 Alive na
## 29 Ampicillin na d_11 Alive na
## 30 Ampicillin na d_11 Dead 11
## 31 Ampicillin na d_11 Dead 10
## 32 Ampicillin na d_11 Alive na
## 37 Ampicillin na d_11 Alive na
## 38 Ampicillin na d_11 Dead 10
## 39 Ampicillin na d_11 Dead 18
## 41 Ampicillin na d_11 Dead 11
## 43 Ampicillin na d_11 Alive na
## 44 Ampicillin na d_11 Alive na
## 45 Ampicillin na d_11 Alive na
## SampleCollected SampesProcessed Description continuous_var_1
## 15 LT QT 806rcbc347 102.8443
## 17 LT QT 806rcbc349 1293.2978
## 18 LT QT 806rcbc350 534.2119
## 19 LT QT 806rcbc351 3148.8693
## 20 LT QT 806rcbc352 3433.2437
## 27 LT QT 806rcbc359 1799.1402
## 29 LT QT 806rcbc361 1428.9089
## 30 LT QT 806rcbc362 5582.6534
## 31 LT QT 806rcbc363 2862.9539
## 32 LT QT 806rcbc364 438.9060
## 37 LT QT 806rcbc369 3270.4070
## 38 LT QT 806rcbc370 4251.9886
## 39 LT QT 806rcbc371 1511.8590
## 41 LT QT 806rcbc373 2464.2605
## 43 LT QT 806rcbc375 513.6356
## 44 LT QT 806rcbc376 1178.5029
## 45 LT QT 806rcbc377 8920.0735
## continuous_var_2 RSVs
## 15 2.730802 921
## 17 3.701808 368
## 18 1.938974 786
## 19 2.695727 553
## 20 2.212551 512
## 27 2.444594 722
## 29 2.113268 598
## 30 2.420489 673
## 31 2.695764 747
## 32 2.161772 806
## 37 2.145264 898
## 38 2.244468 947
## 39 2.872685 557
## 41 3.137555 682
## 43 1.993803 413
## 44 2.226764 809
## 45 2.789934 855
# You can use this data to find a unique identifier to remove those specific samples
ps0 # View the original number of samples
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 443 taxa and 45 samples ]
## sample_data() Sample Data: [ 45 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 443 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 443 tips and 441 internal nodes ]
ps1 <- ps0 %>%
subset_samples(
Name != "806rcbc347"
)
ps1 # The sample number should be reduced according to your expectations
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 443 taxa and 44 samples ]
## sample_data() Sample Data: [ 44 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 443 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 443 tips and 441 internal nodes ]
Note that we created a new phyloseq object called ps1. This preserves all of the data in the original ps0 and creates a new data object with the offending sample(s) removed called ps1.
Failure to detect and remove “bad” samples can make interpreting ordinations much more challenging as they typically project as “outliers” severely skewing the rest of the samples. These samples also increase variance and will impede your ability to identify differentially abundant taxa between groups. So sample outlier removal should be a serious and thoughtful part of every analysis in order to obtain optimal results.
The next code chunk implements an MDS plot of Bray-Curtis dissimilarity. This is a simple projection of multivariant data and can be useful for identifying sample outliers similar to what we just did above. However, this takes into consideration the entire properties of the data set, and not just number of RSVs. If outliers are suspected based on this plot one should consider their removal.
# Outlier evaluation
out.bray <- ordinate(ps1, method = "MDS", distance = "bray")
p.MDS.outlier <- plot_ordination(ps1, out.bray, color = "Treatment", axes = c(1,2)) +
theme_bw() +
geom_point(size = 2) +
ggtitle("MDS of Bray Distances \nOutlier Evaluation") +
geom_text(aes(label = Well), size = 3, check_overlap = FALSE, vjust = -1)
p.MDS.outlier
## Outlier sample removal
You can see when we color code the points by treatment that there are three samples that are intermediate between Vehicle and Ampicillin treatment (labelled G1, H1, and H3), and one Ampicillin sample behaving like a Vehicle treated sample (sample F1). The samples were intentionally labelled by the sample well from the 96-well plate that was used in library creation. It is very common for the edge wells (rows A or H and columns 1 and 12) to have a higher failure rate due to evaporation during thermal cycling. Overlaying the well coordinates as text on the ordination plot shows that the samples exhibiting unexpected behavior were amplified in edge wells, except the H3 sample. Knowing the well position along with the unexpected behavior allows us to make the safe decision that these samples are likely unrepresentative of the experiment and can be justifiably documented and removed.
If you recall the violin plot generated to detect sample outliers above, there were also four samples from the Ampicillin treated samples with unexpectedly high numbers of RSVs. We did not consider those earlier, but this MDS plot suggests that there are samples from Ampicillin treated mice behaving like samples from Vehicle control mice. If you redraw the violin plot but overlay “Well” as the text variable for all samples and not a subset, you will see that samples G1, H1, H3 and in particular the sample in F1 have RSV numbers similar to the Vehicle control mice.
The fact that three of these came from edge wells is a strong argument for their removal. The sample in H3 is a more challenging decision. These samples could just be representative of the natural biological variability in the effectiveness of Ampicillin treatment in mice. Perhaps the antibiotics just didnt work as well in the mouse used in well H3.
Sample removal decisions should be made thoughtufully and considering biological and technical context. Again, the appropirate way to handle this is to document and test the affect of the removal. For this example, I am choosing to remove all of the outlier samples as indicated by the MDS and RSV violin plots (samples in well G1, H1, F1 as well as the curious sample in H3). I am doing so because I want to also exclude variance due to failed Ampicillin treatment which these plots are in support of.
# Note: DO NOT remove based on well number. Each run uses several 96-well plates, so removal by well number will potential remove serveral samples
# Always use a unique ID for sample removal
ps1 # View the original number of samples
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 443 taxa and 44 samples ]
## sample_data() Sample Data: [ 44 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 443 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 443 tips and 441 internal nodes ]
ps2 <- ps1 %>%
subset_samples(
Name != "806rcbc348" &
Name != "806rcbc360" &
Name != "806rcbc374" &
Name != "806rcbc372"
)
ps2 # The sample number should be reduced according to your expectations
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 443 taxa and 40 samples ]
## sample_data() Sample Data: [ 40 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 443 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 443 tips and 441 internal nodes ]
# You can redraw the plot to confirm removal
out.bray.removed <- ordinate(ps2, method = "MDS", distance = "bray")
p.MDS.outlier.removed <- plot_ordination(ps2, out.bray.removed, color = "Treatment") +
theme_bw() +
geom_point(size = 2) +
ggtitle("MDS of Bray Distances \nOutliers Removed")
p.MDS.outlier.removed
##Taxon prevalence estimations and filtering
Low abundant taxa typically do not contribute to ecological community evaluation or differential abundnace testing. There are of course caveats to this statement (i.e. low-abundance pathogen detction), but many analysis can benefit from the removal of uninformative (low prevelance) taxa. Removal of low prevelance taxa greatly assist in tests penalized with a false-discovery-rate (FDR) calculation. Similar to outlier sample removal, low prevelant taxa removal should be justified and documented. The following R chunk provides several evaluations and plots to assist with this decision.
# Begin by removing sequences that were not classified as Bacteria or were classified as either mitochondria or chlorplast
ps2 # Check the number of taxa prior to removal
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 443 taxa and 40 samples ]
## sample_data() Sample Data: [ 40 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 443 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 443 tips and 441 internal nodes ]
ps2 <- ps2 %>%
subset_taxa(
Kingdom == "k__Bacteria" &
Family != "f__mitochondria" &
Class != "c__Chloroplast"
)
ps2 # Confirm that the taxa were removed
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 415 taxa and 40 samples ]
## sample_data() Sample Data: [ 40 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 415 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 415 tips and 413 internal nodes ]
# Note the prefix (e.g. k__, f__, c__, etc.) is a GreenGenes convention. If you are using RDP or Silva you can simply remove these prefixes.
# To remove RSVs lacking phyla classification
#ps2 <- subset_taxa(ps2, !is.na(Phylum) & !Phylum %in% c("","uncharacterized"))
# Prevelance estimation
# Calculate feature prevelance across the data set
prevdf <- apply(X = otu_table(ps2),MARGIN = ifelse(taxa_are_rows(ps2), yes = 1, no = 2),FUN = function(x){sum(x > 0)})
# Add taxonomy and total read counts to prevdf
prevdf <- data.frame(Prevalence = prevdf, TotalAbundance = taxa_sums(ps2), tax_table(ps2))
# Create a table of Phylum, their mean abundances across all samples, and the number of samples they were detected in
plyr::ddply(prevdf, "Phylum", function(df1){cbind(mean(df1$Prevalence),sum(df1$Prevalence))})
## Phylum 1 2
## 1 p__Actinobacteria 15.10000 151
## 2 p__Bacteroidetes 16.23404 763
## 3 p__Cyanobacteria 3.50000 7
## 4 p__Deferribacteres 8.00000 8
## 5 p__Firmicutes 14.43519 4677
## 6 p__Proteobacteria 14.84615 193
## 7 p__Tenericutes 9.87500 158
## 8 p__Verrucomicrobia 23.00000 46
#Prevalence plot
prevdf1 <- subset(prevdf, Phylum %in% get_taxa_unique(ps0, "Phylum"))
p.prevdf1 <- ggplot(prevdf1, aes(TotalAbundance, Prevalence / nsamples(ps2),color=Family)) +
geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) +
geom_point(size = 3, alpha = 0.7) +
scale_x_log10() +
xlab("Total Abundance") + ylab("Prevalence [Frac. Samples]") +
facet_wrap(~Phylum) +
theme(legend.position="none") +
ggtitle("Phylum Prevelence in All Samples\nColored by Family")
p.prevdf1
The code below can be reproduced using the script calleed plot_prevalance.R in the scripts folder.
This code will produce a table and a plot of all of the Phyla present in your samples along with information about their prevelance (fraction of samples they are present in) and total abundance across all samples. For this data, you can see from the table that the Cyanobacteria and Deferribacteres have both low prevelance and overall abundance. This is reflected in the plot which displays all of the unique taxa within each family. The points are colored by family, so you can see that there is only one type of Cyanobacteria and Deferribacteres, but lots of Firmicutes. There is also a ubiqutous Tenericute in the data along with another more randomly distributed Tenericute.
The plot also shows a dashed line that was subjectively chosen to cross at 5% prevelance. Everything below this line is present in fewer than 5% of all of the samples in the study.
Both the table and the plots can be used to remove low prevelant taxa. This can be done by either choosing to remove specific taxa (e.g. Deferribacteres or Cyanobacteria from the example data), or remove all taxa above or below a given threshold (e.g. below 5% prevelance). If you choose to do this you should justify and document. You should also consider analyzing both data (data with all taxa vs. data with low-prevelant taxa removed) to observe the effects of taxa removal. This will be useful for understanding how low-prevelant taxa removal influences your results. Low-prevalence filtering is a common practice precedding differential abudnace testing. Low prevelant taxa create harsh penalties on statistical tests due to multiple testing penalties and it is common practice to remove them. The R chunk below describes how to perform taxon filtering.
For the sake of example, we will go ahead and remove the low-prevalence Deferribacteres and Cyanobacteria. There is no biological basis behind this, and it would actually be detrimental if you were interested in the biology of either type of organism. But statistically they are not going to provide much information about community diversity across samples. On the other hand, they probably will not cause any problems either, so would normally be safe to leave them in. We will remove them just as an example. We will also make a new phyloseq object with taxa < 5% prevalence removed called ps3.prev.
# Remove specific taxa
# Define a variable with taxa to remove
filterPhyla = c("p__Deferribacteres", "p__Cyanobacteria")
ps2 # Check the number of taxa prior to removal
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 415 taxa and 40 samples ]
## sample_data() Sample Data: [ 40 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 415 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 415 tips and 413 internal nodes ]
ps3 <- subset_taxa(ps2, !Phylum %in% filterPhyla)
ps3 # Confirm the taxa were removed
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 412 taxa and 40 samples ]
## sample_data() Sample Data: [ 40 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 412 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 412 tips and 410 internal nodes ]
# Removing taxa that fall below 5% prevelance
# Define the prevalence threshold
prevalenceThreshold = 0.05 * nsamples(ps3)
prevalenceThreshold
## [1] 2
# Define which taxa fall within the prevalence threshold
keepTaxa <- rownames(prevdf1)[(prevdf1$Prevalence >= prevalenceThreshold)]
ps3 # Check the number of taxa prior to removal
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 412 taxa and 40 samples ]
## sample_data() Sample Data: [ 40 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 412 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 412 tips and 410 internal nodes ]
ps3.prev <- prune_taxa(keepTaxa, ps3)
ps3.prev # Confirm the taxa were removed
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 402 taxa and 40 samples ]
## sample_data() Sample Data: [ 40 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 402 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 402 tips and 400 internal nodes ]
Many analysis in community ecology and hypothesis testing benefit from data transformation. Many microbiome data sets do not fit to a normal distribution, but transforming them towards normality may enable more appropriate data for specific statistical tests. The choice of transformation is not straight forward. There is literature on how frequently used transfromations affect certain analysis, but every data set may require different considerations. Therefore, it is recommended that you examine the effects of several transformations on your data and explore how they alter your results and interpretation.
The R chunk below implements several commonly used transformations in microbiome research and plots their results. Similar to outlier removal and prevalance filtering, your choice should be justified, tested and documented.
# Transform to Realative abundances
ps3.ra <- transform_sample_counts(ps3, function(OTU) OTU/sum(OTU))
# Transform to Proportional Abundance
ps3.prop <- transform_sample_counts(ps3, function(x) min(sample_sums(ps3)) * x/sum(x))
# Log transformation moves to a more normal distribution
ps3.log <- transform_sample_counts(ps3, function(x) log(1 + x))
# View how each function altered count data
par(mfrow=c(1,4))
plot(sort(sample_sums(ps3), TRUE), type = "o", main = "Native", ylab = "RSVs", xlab = "Samples")
plot(sort(sample_sums(ps3.log), TRUE), type = "o", main = "log Transfromed", ylab = "RSVs", xlab = "Samples")
plot(sort(sample_sums(ps3.ra), TRUE), type = "o", main = "Relative Abundance", ylab = "RSVs", xlab = "Samples")
plot(sort(sample_sums(ps3.prop), TRUE), type = "o", main = "Proportional Abundance", ylab = "RSVs", xlab = "Samples")
par(mfrow=c(1,1))
# Histograms of the non-transformed data vs. the transformed data can address the shift to normality
p.nolog <- qplot(rowSums(otu_table(ps3))) + ggtitle("Raw Counts") +
theme_bw() +
xlab("Row Sum") +
ylab("# of Samples")
p.log <- qplot(log10(rowSums(otu_table(ps3)))) +
ggtitle("log10 transformed counts") +
theme_bw() +
xlab("Row Sum") +
ylab("# of Samples")
grid.arrange(p.nolog, p.log, ncol = 2)
##Subsetting (optional)
You will frequently find that you want to analyze a subset of your total data set. There are typically commands that will allow you to do this for each individual analysis, but similar to variable reordering it can sometime be more convenient to do this towards the beginning of your analysis. This should be done after removal of outlier samples and taxa. If you wish to create transformed versions of each subset you can either subset the transformed data you just generated, or alternatively retransform your subsetted data. The R chunk below is an example subsetting of the example data by treatment.
Subsetting away samples can create a situation where taxa are present as empty rows. This is because not every sample has every taxa. These can be removed as shown in the R chunk below.
Creating individual subsets like this can be particularly useful when assessing differential abundance using DeSeq2.
# Make a subset of mice treated with Ampicillin
ps3 # Check the original number of samples
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 412 taxa and 40 samples ]
## sample_data() Sample Data: [ 40 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 412 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 412 tips and 410 internal nodes ]
ps3.amp <- subset_samples(ps3, Treatment == "Ampicillin")
ps3.amp # Check that the number of samples is correct following the subsetting
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 412 taxa and 26 samples ]
## sample_data() Sample Data: [ 26 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 412 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 412 tips and 410 internal nodes ]
any(taxa_sums(ps3.amp) == 0) # In this case it is TRUE, so remove the zero's
## [1] TRUE
ps3.amp <- prune_taxa(taxa_sums(ps3.amp) > 0, ps3.amp)
ps3.amp # Number of taxa should be smaller now
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 359 taxa and 26 samples ]
## sample_data() Sample Data: [ 26 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 359 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 359 tips and 357 internal nodes ]
any(taxa_sums(ps3.amp) == 0) # It should now be false
## [1] FALSE
# Make a subset of mice treated with Vehicle
ps3 # Check the original number of samples
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 412 taxa and 40 samples ]
## sample_data() Sample Data: [ 40 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 412 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 412 tips and 410 internal nodes ]
ps3.vehicle <- subset_samples(ps3, Treatment == "Vehicle")
ps3.vehicle # Check that the number of svehicleles is correct following the subsetting
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 412 taxa and 14 samples ]
## sample_data() Sample Data: [ 14 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 412 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 412 tips and 410 internal nodes ]
any(taxa_sums(ps3.vehicle) == 0) # In this case it is TRUE, so remove the zero's
## [1] TRUE
ps3.vehicle <- prune_taxa(taxa_sums(ps3.vehicle) > 0, ps3.vehicle)
ps3.vehicle # Number of taxa should be smaller now
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 341 taxa and 14 samples ]
## sample_data() Sample Data: [ 14 samples by 26 sample variables ]
## tax_table() Taxonomy Table: [ 341 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 341 tips and 339 internal nodes ]
any(taxa_sums(ps3.vehicle) == 0) # It should now be false
## [1] FALSE
Bar plots can be useful for examining overall taxon representation across your samples. Importantly, they do not portray any information about the overall abundance of each taxa (just relative abundance) and should not be usef for making anything more than high-level visual representations of the community composition of your data.
# Create a data table for ggploting
ps3_phylum <- ps3 %>%
tax_glom(taxrank = "Phylum") %>% # agglomerate at phylum level
transform_sample_counts(function(x) {x/sum(x)} ) %>% # Transform to rel. abundance (or use ps0.ra)
psmelt() %>% # Melt to long format for easy ggploting
filter(Abundance > 0.01) # Filter out low abundance taxa
# Plot - Phylum
p.ra.phylum <- ggplot(ps3_phylum, aes(x = Name, y = Abundance, fill = Phylum)) +
geom_bar(stat = "identity", width = 1) +
facet_grid(~Treatment, scales = "free_x") +
theme(axis.text.x = element_blank()) +
theme(axis.title.x = element_blank()) +
ggtitle("Abundant Phylum (> 1%)")
p.ra.phylum
# Note: This is a nice place to output tables of data that you may want to use for other analysis, or to include as supplemental data for publication
# You can rerun the first bit of code in this chunk and change Phylum to Species for a table with all possible classifications
write.table(ps3_phylum, file = "phylum_relab.txt", sep = "\t")
ggplotly(p.ra.phylum)
These plots show that Ampicillin treatment changes the ratio of Bacteroidetes to Firmicutes and results in the expansion of Tenericutes and Proteobacteria.
Alpha diversity is diversity of species on a local scale such as a specific environmental sampling site or patient sample. Species diversity consists of two componentsL species richness and species evenness. Species richness is simply the count of species at a site while species evenness takes into account there abundances. Equal abundances indicates high evenness and low diversity, unequal abundnaces indicates low evenness and high diversity.
The basic properties can be useful for determining high-level differences between study groups. Importantly, they do not take into account any taxonomic information, so any differences you see require further investigations into what taxa are driving differences in alpha diversity.
To test for differences in alpha diversity we will compare the mean diversity of different diversity measures between groups using a t-test and use box or violin plots for display. The R chunk below implements two measures: 1) Observed which is the observed number of species, or richness, 2) Shannon Diversity which is a standard diversity measure. Other measures are available in estimate_richness options.
# Diversity
diversity <- global(ps3)
head(diversity)
## richness_0 richness_20 richness_50 richness_80
## 806rcbc333 236 236 236 187
## 806rcbc334 246 246 246 174
## 806rcbc335 255 255 255 213
## 806rcbc336 222 222 222 148
## 806rcbc337 199 199 199 128
## 806rcbc338 234 234 234 178
## diversities_inverse_simpson diversities_gini_simpson
## 806rcbc333 11.139720 0.9102311
## 806rcbc334 7.368930 0.8642951
## 806rcbc335 11.768307 0.9150260
## 806rcbc336 10.220625 0.9021586
## 806rcbc337 5.157725 0.8061161
## 806rcbc338 8.060761 0.8759422
## diversities_shannon diversities_fisher diversities_coverage
## 806rcbc333 3.275649 32.59279 4
## 806rcbc334 2.906675 34.90328 3
## 806rcbc335 3.426973 35.42800 4
## 806rcbc336 2.963711 31.89358 4
## 806rcbc337 2.292101 27.78067 2
## 806rcbc338 3.036928 34.16407 3
## evenness_camargo evenness_pielou evenness_simpson evenness_evar
## 806rcbc333 0.06699647 0.5995149 0.02703816 0.1738477
## 806rcbc334 0.05223827 0.5279745 0.01788575 0.1742576
## 806rcbc335 0.07982035 0.6184461 0.02856385 0.1801690
## 806rcbc336 0.04564875 0.5485633 0.02480734 0.1689148
## 806rcbc337 0.02749613 0.4330189 0.01251875 0.1795631
## 806rcbc338 0.06114635 0.5566909 0.01956495 0.1846712
## evenness_bulla dominance_dbp dominance_dmn dominance_absolute
## 806rcbc333 0.1874398 0.1989041 0.3585072 9039
## 806rcbc334 0.1613208 0.2547290 0.4956511 10221
## 806rcbc335 0.2112214 0.2093259 0.3589380 9903
## 806rcbc336 0.1373706 0.1604751 0.3085372 5391
## 806rcbc337 0.1044051 0.2970703 0.5764509 10647
## 806rcbc338 0.1839894 0.2795278 0.4198198 8998
## dominance_relative dominance_simpson dominance_core_abundance
## 806rcbc333 0.1989041 0.08976886 0.6612974
## 806rcbc334 0.2547290 0.13570492 0.8227040
## 806rcbc335 0.2093259 0.08497399 0.6436196
## 806rcbc336 0.1604751 0.09784137 0.8071084
## 806rcbc337 0.2970703 0.19388393 0.9112165
## 806rcbc338 0.2795278 0.12405776 0.7901522
## dominance_gini rarity_log_modulo_skewness rarity_low_abundance
## 806rcbc333 0.9330035 2.059167 0.08064871
## 806rcbc334 0.9477617 2.060759 0.07170093
## 806rcbc335 0.9201797 2.058312 0.08201399
## 806rcbc336 0.9543512 2.059651 0.06435673
## 806rcbc337 0.9725039 2.060607 0.05094866
## 806rcbc338 0.9388537 2.061208 0.07555141
## rarity_noncore_abundance rarity_rare_abundance
## 806rcbc333 0.16512631 0.005127190
## 806rcbc334 0.08750156 0.002566978
## 806rcbc335 0.24337864 0.035067323
## 806rcbc336 0.13252366 0.014437102
## 806rcbc337 0.04319196 0.001004464
## 806rcbc338 0.13460702 0.004442373
sd.3 <- as.data.frame(sample_data(ps3)) # useful to coerce the phyloseq object into an R data frame
ps3.rich <- merge(sd.3, diversity, by ="row.names") # merge sd.1 by row names
# Add divergence measurements
ps3.rich$divergence <- divergence(ps3)
p.rich.treatment <- ggboxplot(ps3.rich, x = "Treatment", y = "richness_0", outlier.shape = NA) +
geom_jitter(width = 0.2) +
# facet_grid(~Week) +
ylab("Richness") +
theme(axis.title.x = element_blank()) +
stat_compare_means(method = "t.test")
p.sd.treatment <- ggboxplot(ps3.rich, x = "Treatment", y = "diversities_shannon", outlier.shape = NA) +
geom_jitter(width = 0.2) +
# facet_grid(~Week) +
ylab("Shannon diversity") +
theme(axis.title.x = element_blank()) +
stat_compare_means(method = "t.test")
grid.arrange(p.rich.treatment, p.sd.treatment, ncol = 2)
Analysis of alpha diversity along continuous variables can be visualized using visually weighted regression plots (http://www.fight-entropy.com/2012/07/visually-weighted-regression.html).
# The plot_regression command from the microbiome package requires phyloseq objects
# For the violin plots above we combined alpha diversity measurements with sample data as a data frame to generate ggplot ready data
# We will take the same approach here, but will combine alpha diversity measurements with sample data in the phyloseq object to prepare it for microbiome
# Diversity
diversity.vehicle <- global(ps3.vehicle)
diversity.amp <- global(ps3.amp)
head(diversity.amp)
## richness_0 richness_20 richness_50 richness_80
## 806rcbc349 64 64 64 36
## 806rcbc350 91 91 91 46
## 806rcbc351 76 76 76 45
## 806rcbc352 80 80 80 48
## 806rcbc353 123 123 123 77
## 806rcbc354 181 181 181 113
## diversities_inverse_simpson diversities_gini_simpson
## 806rcbc349 15.381904 0.9349885
## 806rcbc350 7.507538 0.8668005
## 806rcbc351 8.357182 0.8803424
## 806rcbc352 18.134298 0.9448559
## 806rcbc353 14.146269 0.9293100
## 806rcbc354 23.785381 0.9579574
## diversities_shannon diversities_fisher diversities_coverage
## 806rcbc349 3.320775 23.22971 7
## 806rcbc350 3.004054 28.33299 3
## 806rcbc351 3.080778 24.74539 4
## 806rcbc352 3.573072 30.11120 8
## 806rcbc353 3.415539 35.53045 5
## 806rcbc354 3.914249 49.81828 8
## evenness_camargo evenness_pielou evenness_simpson evenness_evar
## 806rcbc349 0.06429491 0.7984777 0.04284653 0.4510124
## 806rcbc350 0.06192097 0.6659605 0.02091236 0.4124210
## 806rcbc351 0.06107885 0.7113756 0.02327906 0.4385144
## 806rcbc352 0.08464755 0.8153925 0.05051336 0.4906040
## 806rcbc353 0.08254724 0.7097689 0.03940465 0.3828727
## 806rcbc354 0.12645936 0.7529578 0.06625454 0.3508831
## evenness_bulla dominance_dbp dominance_dmn dominance_absolute
## 806rcbc349 0.1759777 0.18421053 0.2748538 63
## 806rcbc350 0.1925512 0.31259259 0.4711111 211
## 806rcbc351 0.1839789 0.30844794 0.4066798 157
## 806rcbc352 0.2117094 0.16290727 0.2756892 65
## 806rcbc353 0.2248285 0.14949863 0.2816773 164
## 806rcbc354 0.2778637 0.09645777 0.1874659 177
## dominance_relative dominance_simpson dominance_core_abundance
## 806rcbc349 0.18421053 0.06501146 0.8654971
## 806rcbc350 0.31259259 0.13319945 0.8622222
## 806rcbc351 0.30844794 0.11965756 0.8644401
## 806rcbc352 0.16290727 0.05514413 0.8320802
## 806rcbc353 0.14949863 0.07069001 0.8559708
## 806rcbc354 0.09645777 0.04204263 0.6850136
## dominance_gini rarity_log_modulo_skewness rarity_low_abundance
## 806rcbc349 0.9357051 2.037462 0.00000000
## 806rcbc350 0.9380790 2.050175 0.06666667
## 806rcbc351 0.9389211 2.049303 0.06090373
## 806rcbc352 0.9153524 2.039103 0.00000000
## 806rcbc353 0.9174528 2.054327 0.09662716
## 806rcbc354 0.8735406 2.044635 0.10354223
## rarity_noncore_abundance rarity_rare_abundance
## 806rcbc349 0.06725146 0.01461988
## 806rcbc350 0.07259259 0.02518519
## 806rcbc351 0.07858546 0.02750491
## 806rcbc352 0.11278195 0.04511278
## 806rcbc353 0.07383774 0.02461258
## 806rcbc354 0.16893733 0.05340599
# Combine diversity into metadata
sample_data(ps3.vehicle)$Shannon <- diversity.vehicle$diversities_shannon
sample_data(ps3.vehicle)$Richness <- diversity.vehicle$richness_0
sample_data(ps3.amp)$Shannon <- diversity.amp$diversities_shannon
sample_data(ps3.amp)$Richness <- diversity.amp$richness_0
colnames(sample_data(ps3.vehicle))
## [1] "X.SampleID" "NumberReads" "BarcodeSequence"
## [4] "LinkerPrimerSequence" "Plate" "Well"
## [7] "Name" "SampleNo" "DNAPlateNumber"
## [10] "Cell" "Experiment" "Microbiota"
## [13] "Sex" "GroupedCage" "IndividualCage"
## [16] "DayPostTreatment" "Treatment" "Virus"
## [19] "DayPostInoculation" "SurvivalStatus" "DaysToDeath"
## [22] "SampleCollected" "SampesProcessed" "Description"
## [25] "continuous_var_1" "continuous_var_2" "Shannon"
## [28] "Richness"
colnames(sample_data(ps3.amp))
## [1] "X.SampleID" "NumberReads" "BarcodeSequence"
## [4] "LinkerPrimerSequence" "Plate" "Well"
## [7] "Name" "SampleNo" "DNAPlateNumber"
## [10] "Cell" "Experiment" "Microbiota"
## [13] "Sex" "GroupedCage" "IndividualCage"
## [16] "DayPostTreatment" "Treatment" "Virus"
## [19] "DayPostInoculation" "SurvivalStatus" "DaysToDeath"
## [22] "SampleCollected" "SampesProcessed" "Description"
## [25] "continuous_var_1" "continuous_var_2" "Shannon"
## [28] "Richness"
p.regr.veh <- plot_regression(Shannon ~ Richness, meta(ps3.vehicle), spag = FALSE, shade = FALSE, show.lm = TRUE) +
ggtitle("Vehicle Control Mice") +
ylab("Shannon Diversity") +
xlab("Richness")
p.regr.amp <- plot_regression(Shannon ~ Richness, meta(ps3.amp), spag = FALSE, shade = FALSE, show.lm = TRUE) +
ggtitle("Ampicillin Treated Mice") +
ylab("Shannon Diversity") +
xlab("Richness")
grid.arrange(p.regr.veh, p.regr.amp, nrow = 2)
Correlation between measures of alpha diversity and continuous variables can be assessed using the cor.test function of base R and the corrplot package for measuring multiple variables at once.
cor.vehicle <- cor(diversity.vehicle)
cor.amp <- cor(diversity.amp)
corrplot(cor.vehicle)
corrplot(cor.amp)
# If you wanted to compare with data with sample metadata you can merge the entire diversity data frame to your sample data
ps3.div.vehicle <- merge(sample_data(ps3.vehicle), diversity.vehicle, by = "row.names")
ps3.div.amp <- merge(sample_data(ps3.amp), diversity.amp, by = "row.names")
# Remove non-numeric variables
ps3.div.vehicle <- select_if(ps3.div.vehicle, is.numeric)
ps3.div.amp <- select_if(ps3.div.amp, is.numeric)
# Remove uninteresting variables, in this case the first 5 variables are not needed
ps3.div.vehicle <- subset(ps3.div.vehicle, select = -c(1:5))
ps3.div.amp <- subset(ps3.div.amp, select = -c(1:5))
# Run correlations
ps3.div.v.cor <- cor(ps3.div.vehicle)
ps3.div.a.cor <- cor(ps3.div.amp)
# Correlation plots
corrplot(ps3.div.v.cor)
corrplot(ps3.div.a.cor)
These results demonstrate how the average number of bacteria detected in ampicillin treated mice decreases as expected following antibiotic treatment. The diversity increases which is also reflected through the increase in Tenericutes and Proteobacteria in the community composition bar plots.
Beta diversity is the extent of difference in community composition between sites or along a gradient. There are a large number of ways to analyze and visualize beta diversity, but we are basically looking for a ordination to help us determine how different samples from different groups are. For example, we want to project the diversity of each sample as a point across an ordination plot that explains as as much variance in the data as possible and in an unconstrained manner. We then color each sample by a grouping variable (e.g. Treatment, Infection, Disease, Sex) and look for distinct grouping.
One can run statistical tests to determine if the null hypothesis that the groups are not dissimilar should be rejected at a given alpha.
This first R chunk will implement two ordinations using two commonly used phylogenetic diversity measures, unifrac and weighted unifrac. Weighted unifrac takes into account the relative abundnaces of each taxa, while unifrac does not. Both measures take into account the phylogenetic relatedness of each taxa. There are a large number of measures that can be interrogated.
Key factors that one may want to consdier when approaching a beta diversity analysis are:
# Ordinate
ord.pcoa.uni <- ordinate(ps3, method = "PCoA", distance = "unifrac")
ord.pcoa.wuni <- ordinate(ps3, method = "PCoA", distance = "wunifrac")
# Plots
p.pcoa.uni <- plot_ordination(ps3, ord.pcoa.uni, color = "GroupedCage") +
geom_point(size=5, alpha = 0.7) +
geom_point(colour = "grey50", size = 1.5) +
facet_grid(~ Treatment) +
stat_ellipse(type = "norm") +
ggtitle("PCoA | Unifrac")
p.pcoa.wuni <- plot_ordination(ps3, ord.pcoa.wuni, color = "SurvivalStatus") +
geom_point(size=5, alpha = 0.7) +
geom_point(colour = "grey50", size = 1.5) +
facet_grid(~ Treatment) +
stat_ellipse(type = "norm") +
ggtitle("PCoA | wUnifrac")
grid.arrange(p.pcoa.uni, p.pcoa.wuni, nrow = 2)
For fun the points are color coded by the cage each animal originate from or survival status . The take home message from these plots is that the samples are separated by treatment, and there isnt much of any separation by cage. This is to be expected from the experimental design. The ADONIS test can be used to test if the treatment groups are significantly different.
# Set a random seed so that exact results can be reproduced
set.seed(1000)
# Function to run adonis test on a physeq object and a variable from metadata
doadonis <- function(physeq, category) {
bdist <- phyloseq::distance(physeq, "bray")
col <- as(sample_data(physeq), "data.frame")[ ,category]
# Adonis test
adonis.bdist <- adonis(bdist ~ col)
print("Adonis results:")
print(adonis.bdist)
# Homogeneity of dispersion test
betatax = betadisper(bdist,col)
p = permutest(betatax)
print("Betadisper results:")
print(p$tab)
}
# Runs Permanov and Homogeneity of dispersion test
# See ?betadisper for more info
doadonis(ps0, "Treatment")
## [1] "Adonis results:"
##
## Call:
## adonis(formula = bdist ~ col)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## col 1 5.2961 5.2961 29.122 0.40378 0.001 ***
## Residuals 43 7.8201 0.1819 0.59622
## Total 44 13.1162 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Betadisper results:"
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 1 0.07064695 0.07064695 2.394681 999 0.132
## Residuals 43 1.26856946 0.02950162 NA NA NA
You can see from the highly significant p-value that treatment does indeed indicate that the groups are dissimilar. The Homogeneity of Dispersion is a test for how variable the sample groups are. For this data, the significant p-value suggets that the variance is significantly different between the groups as well.
The second doadonis command tests whether or not mice that survived or died as a result of treatment were dissimilar. This example was included as it demonstrates the utility of subsetting as we only wanted to run the test on ampicillin treated samples.
# Differential Abundance - Baseline ~ CD4 and VL Category
ds <- phyloseq_to_deseq2(ps0, ~Treatment) # Change formula to VL_category and rerun to test VL_category
dds <- DESeq(ds, test="Wald", fitType="local", betaPrior = FALSE) # Worth trying parametric and local fitTypes
alpha = 0.05
# Tabulate and write results
res.dds = results(dds, cooksCutoff = FALSE)
sigtab_dds = res.dds[which(res.dds$padj < alpha), ]
sigtab_dds = cbind(as(sigtab_dds, "data.frame"), as(tax_table(ps0)[rownames(sigtab_dds), ], "matrix"))
head(sigtab_dds)
## baseMean
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA 327.03171
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC 235.35751
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA 100.85545
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA 38.55688
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA 481.24391
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC 101.87178
## log2FoldChange
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA -1.677454
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC -1.758224
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA -1.042287
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA -1.760693
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA 4.830774
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC 1.668447
## lfcSE
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA 0.3647439
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC 0.3972025
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA 0.3449343
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA 0.3789173
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA 0.6194682
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC 0.4579249
## stat
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA -4.598992
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC -4.426519
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA -3.021696
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA -4.646642
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA 7.798261
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC 3.643494
## pvalue
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA 4.245399e-06
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC 9.576597e-06
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA 2.513628e-03
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA 3.373819e-06
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA 6.276629e-15
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC 2.689617e-04
## padj
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA 2.843131e-05
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC 6.224788e-05
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA 9.415452e-03
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA 2.330044e-05
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA 7.300710e-14
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC 1.350921e-03
## Kingdom
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA k__Bacteria
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC k__Bacteria
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA k__Bacteria
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA k__Bacteria
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA k__Bacteria
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC k__Bacteria
## Phylum
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA p__Bacteroidetes
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC p__Bacteroidetes
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA p__Bacteroidetes
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA p__Bacteroidetes
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA p__Tenericutes
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC p__Firmicutes
## Class
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA c__Bacteroidia
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC c__Bacteroidia
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA c__Bacteroidia
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA c__Bacteroidia
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA c__Mollicutes
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC c__Bacilli
## Order
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA o__Bacteroidales
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC o__Bacteroidales
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA o__Bacteroidales
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA o__Bacteroidales
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA o__Anaeroplasmatales
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC o__Lactobacillales
## Family
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA f__S24-7
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC f__Rikenellaceae
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA f__S24-7
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA f__S24-7
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA f__Anaeroplasmataceae
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC f__Lactobacillaceae
## Genus
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA g__
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC g__
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA g__
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA g__
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA g__Anaeroplasma
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC g__Lactobacillus
## Species
## CGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGATGCCAAGTCAGCGGTAAAAAAGCGGTGCTCAACGCCGTCGAGCCGTTGAAACTGGCGTTCTTGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCATACCGGCGCCCTACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA s__
## GGAGGATTCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCGGATAAGTTAGAGGTGAAATCCCGAGGCTCAACTTCGGAATTGCCTCTGATACTGTTCGGCTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAGCTACTTCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAAC s__
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA s__
## GGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCCGTTAAGTCAGCGGTAAAATTGCGGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCACACTGACGCTGAGGCACGAAAGCGTGGGGATCGAACA s__
## CGTAGGGAGCGAGCGTTATCCGGAATTATTGGGCGTAAAGGGTGCGTAGATGGCATAGTAAGTCTTTTGTAAAAATGCTGGGCTCAACCCAGTAGGGCAAAAGATACTGCAAAGCTAGAGTATGACAGAGGCAAGTGGAACTACATGTGTAGCGGTAAAATGCGTAAATATATGTAAGAACACCAGTGGCGAAGGCGGCTTGCTGGGTCGATACTGACATTGAGGCACGAAAGCGTGGGGAGCAAACA s__
## GTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGTGCAGGCGGTTCAATAAGTCTGATGTGAAAGCCTTCGGCTCAACCGGAGAATTGCATCAGAAACTGTTGAACTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAAC s__
write.table(sigtab_dds, file="./results/deseq_results.txt", sep = "\t") # Change filename to save results to appropriate file
# Simple plot of results
x.ds = tapply(sigtab_dds$log2FoldChange, sigtab_dds$Family, function(x) max(x))
x.ds = sort(x.ds, TRUE)
sigtab_dds$Species = factor(as.character(sigtab_dds$Family), levels=names(x.ds))
p.deseq <- ggplot(sigtab_dds, aes(x=Family, y=log2FoldChange, color=Genus)) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) +
geom_point(size=5, alpha = 0.7) +
theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5)) +
ggtitle("Taxa Differentially Associated by Treatment") +
geom_hline(yintercept = 0, lwd=1.5)
p.deseq
# Quick check of factor levels
mcols(res.dds, use.names = TRUE)
## DataFrame with 6 rows and 2 columns
## type
## <character>
## baseMean intermediate
## log2FoldChange results
## lfcSE results
## stat results
## pvalue results
## padj results
## description
## <character>
## baseMean mean of normalized counts for all samples
## log2FoldChange log2 fold change (MLE): Treatment Ampicillin vs Vehicle
## lfcSE standard error: Treatment Ampicillin vs Vehicle
## stat Wald statistic: Treatment Ampicillin vs Vehicle
## pvalue Wald test p-value: Treatment Ampicillin vs Vehicle
## padj BH adjusted p-values
Volcano plots are useful for visualizing not only the differentially abundnat taxa, but those taxa which did not pass the alpha threshold. Interactive plots using plotly can aid in interrogating the complex arrangements of points.
# Prepare to join results data.table and taxonomy table
resdt = data.table(as(results(dds, cooksCutoff = FALSE), "data.frame"),
keep.rownames = TRUE)
setnames(resdt, "rn", "OTU")
taxdt = data.table(data.frame(as(tax_table(ps0), "matrix")), keep.rownames = TRUE)
setnames(taxdt, "rn", "OTU")
# Join results data.table and taxonomy table
setkeyv(taxdt, "OTU")
setkeyv(resdt, "OTU")
resdt <- taxdt[resdt]
resdt
## OTU
## 1: ACGGGGGGGGCAAGTGTTCTTCGGAATGACTGGGCGTAAAGGGCACGTAGGCGGTGAATCGGGTTGAAAGTGAAAGTCGCCAAAAACTGGTGGAATGCTCTCGAAACCAATTCACTTGAGTGAGACAGAGGAGAGTGGAATTTCGTGTGTAGGGGTGAAATCCGCAGATCTACGAAGGAACGCCAAAAGCGAAGGCAGCTCTCTGGGTCCCTACCGACGCTGGAGTGCGAAAGCATGGGGAGCGAACGG
## 2: ACGGGGGGGGCAAGTGTTCTTCGGAATGACTGGGCGTAAAGGGCACGTAGGCGGTGAATCGGGTTGAAAGTGAAAGTCGCCAAAAAGTGGCGGAATGCTCTCGAAACCAATTCACTTGAGTGAGACAGAGGAGAGTGGAATTTCGTGTGTAGGGGTGAAATCCGTAGATCTACGAAGGAACGCCAAAAGCGAAGGCAGCTCTCTGGGTCCCTACCGACGCTGGGGTGCGAAAGCATGGGGAGCGAACAG
## 3: AGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTCAAGTCCGCCGTCAAATCCCAGGGCTCAACCCTGGACAGGCGGTGGAAACTACCAAGCTGGAGTACGGTAGGGGCAGAGGGAATTTCCGGTGGAGCGGTGAAATGCATTGAGATCGGAAAGAACACCAACGGCGAAAGCACTCTGCTGGGCCGACACTGACACTGAGAGACGAAAGCTAGGGGAGCAAAT
## 4: AGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTTAAGTTCGCCGTCAAATCCCAGGGCTCAACCCTGGACAGGCGGTGGAAACTACCAAGCTGGAGTACGGTAGGGGCAGAGGGAATTTCCGGTGGAGCGGTGAAATGCGTAGAGATCGGAAAGAACACCAACGGCGAAAGCACTCTGCTGGGCCGACACTGACACTGAGAGACGAAAGCTAGGGGAGCGAAT
## 5: AGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTTAAGTTCGCCGTCAAATCCCAGGGCTCAACCCTGGACAGGTGGTGAAAACTACTAAGCTAGAGTACGGTAGGGGCAGAGGGAATTTCCGGTGGAGCGATGAAATGCGTAGAGATCGGAAGGAACACCAACGGCGAAAGCACTCTGCTGGGCCGACACTGACACTGAGAGACGAAAGCTAGGGGAGCGAAT
## ---
## 439: TAGGGGGCAAGCGTTATCCGGAATTACTGGGTGTAAAGGGAGCGTAGGCGGCATGGTAAGCCAGATGTGAAAGCCCGCGGCTTAACCGCGCGGATTGCATTTGGAACTATCAAGCTAGAGTACAGGAGAGGAAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTTCTGGACTGAAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCAAA
## 440: TAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGGTGGCATGGCAAGCCAGAAGTGAAAACCCGGGGCTTAACCCCGCGGATTGCTTTTGGAACTGTCAGGCTGGAGTGCAGGAGGGGCAGGCGGAATTCCTGGTGTAGCGGTGAAATGCGTAGATATCAGGAGGAACACCGGTGGCGAAGGCGGCCTGCTGGACTGTAACTGACACTGAGGCTCGAAAGCGTGGGGAGCAAA
## 441: TAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGGTGGCATGGTAAGCCAGAAGTGAAAACCCAGGGCTCAACTCTGTGGATTGCTTTTGGAACTATCAAGCTAGAGTGCTGGAGGGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGCTTACTGGACAGAAACTGACACTGAGGCTCGAAAGCGTGGGGAGCAAA
## 442: TAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGGTGGCATGGTAAGCCAGAAGTGAAAACCCGGGGCTCAACCCCGCGGATTGCTTTTGGAACTATCAGGCTGGAGTGCTGGAGGGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGCTTACTGGACAGAAACTGACACTGAGGCTCGAAAGCGTGGGGAGCAAA
## 443: TAGGGGGCAAGCGTTGTTCGGAATGACTGGGCGTAAAGGGTGTGTAGGTGGTTTGATAAGTTAGATGTGTAATACCCAGGGCTTAACTCGGGTGCTGCATCTAAAACTGTTAGACTTGAGTACTGGAGAGGATAGTGGAATTCCTAGTGTAGCGGTGGAATGCGTAGATATTAGGAGGAACATCAGTGGCGAAGGCGACTATCTGGACAGCAACTGACACTGAGGCACGAAAGCGTGGGGAGCAAA
## Kingdom Phylum Class Order
## 1: k__Bacteria p__Proteobacteria c__Alphaproteobacteria o__Rickettsiales
## 2: k__Bacteria p__Proteobacteria c__Alphaproteobacteria o__Rickettsiales
## 3: k__Bacteria p__Cyanobacteria c__Chloroplast o__Streptophyta
## 4: k__Bacteria p__Cyanobacteria c__Chloroplast o__Streptophyta
## 5: k__Bacteria p__Cyanobacteria c__Chloroplast o__Streptophyta
## ---
## 439: k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales
## 440: k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales
## 441: k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales
## 442: k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales
## 443: k__Bacteria p__Proteobacteria NA NA
## Family Genus Species baseMean
## 1: f__mitochondria g__Lupinus s__luteus 8.1539852
## 2: f__mitochondria g__Zea s__luxurians 74.2677521
## 3: f__ g__ s__ 20.9865616
## 4: f__ g__ s__ 105.1752823
## 5: f__ g__ s__ 2.2078458
## ---
## 439: NA NA NA 0.1974782
## 440: f__ g__ s__ 1.2531111
## 441: f__Lachnospiraceae g__Coprococcus s__ 0.4991015
## 442: f__Lachnospiraceae g__Coprococcus s__ 0.1049244
## 443: NA NA NA 0.7233974
## log2FoldChange lfcSE stat pvalue padj
## 1: 9.3321835 0.8224754 11.3464590 7.722724e-30 4.876349e-28
## 2: 5.7485723 0.9156034 6.2784526 3.419592e-10 3.285781e-09
## 3: 3.8363204 0.9407839 4.0777914 4.546553e-05 2.543768e-04
## 4: 3.6737619 1.0525083 3.4904826 4.821490e-04 2.254025e-03
## 5: 7.5856134 0.9096882 8.3386964 7.511401e-17 9.764822e-16
## ---
## 439: 0.9060222 0.8051730 1.1252516 2.604824e-01 3.725995e-01
## 440: 2.0331237 0.6033231 3.3698752 7.520224e-04 3.426741e-03
## 441: 0.2409729 0.6382957 0.3775254 7.057832e-01 7.645984e-01
## 442: 2.2740985 1.0726519 2.1200713 3.400003e-02 8.079578e-02
## 443: 1.5507372 1.2630825 1.2277403 2.195445e-01 3.291055e-01
resdt[, Significant := padj < alpha]
resdt[!is.na(Significant)]
## OTU
## 1: ACGGGGGGGGCAAGTGTTCTTCGGAATGACTGGGCGTAAAGGGCACGTAGGCGGTGAATCGGGTTGAAAGTGAAAGTCGCCAAAAACTGGTGGAATGCTCTCGAAACCAATTCACTTGAGTGAGACAGAGGAGAGTGGAATTTCGTGTGTAGGGGTGAAATCCGCAGATCTACGAAGGAACGCCAAAAGCGAAGGCAGCTCTCTGGGTCCCTACCGACGCTGGAGTGCGAAAGCATGGGGAGCGAACGG
## 2: ACGGGGGGGGCAAGTGTTCTTCGGAATGACTGGGCGTAAAGGGCACGTAGGCGGTGAATCGGGTTGAAAGTGAAAGTCGCCAAAAAGTGGCGGAATGCTCTCGAAACCAATTCACTTGAGTGAGACAGAGGAGAGTGGAATTTCGTGTGTAGGGGTGAAATCCGTAGATCTACGAAGGAACGCCAAAAGCGAAGGCAGCTCTCTGGGTCCCTACCGACGCTGGGGTGCGAAAGCATGGGGAGCGAACAG
## 3: AGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTCAAGTCCGCCGTCAAATCCCAGGGCTCAACCCTGGACAGGCGGTGGAAACTACCAAGCTGGAGTACGGTAGGGGCAGAGGGAATTTCCGGTGGAGCGGTGAAATGCATTGAGATCGGAAAGAACACCAACGGCGAAAGCACTCTGCTGGGCCGACACTGACACTGAGAGACGAAAGCTAGGGGAGCAAAT
## 4: AGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTTAAGTTCGCCGTCAAATCCCAGGGCTCAACCCTGGACAGGCGGTGGAAACTACCAAGCTGGAGTACGGTAGGGGCAGAGGGAATTTCCGGTGGAGCGGTGAAATGCGTAGAGATCGGAAAGAACACCAACGGCGAAAGCACTCTGCTGGGCCGACACTGACACTGAGAGACGAAAGCTAGGGGAGCGAAT
## 5: AGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTTAAGTTCGCCGTCAAATCCCAGGGCTCAACCCTGGACAGGTGGTGAAAACTACTAAGCTAGAGTACGGTAGGGGCAGAGGGAATTTCCGGTGGAGCGATGAAATGCGTAGAGATCGGAAGGAACACCAACGGCGAAAGCACTCTGCTGGGCCGACACTGACACTGAGAGACGAAAGCTAGGGGAGCGAAT
## ---
## 438: TAGGGGGCAAGCGTTATCCGGAATTACTGGGTGTAAAGGGAGCGTAGGCGGCATGGTAAGCCAGATGTGAAAGCCCGCGGCTTAACCGCGCGGATTGCATTTGGAACTATCAAGCTAGAGTACAGGAGAGGAAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTTCTGGACTGAAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCAAA
## 439: TAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGGTGGCATGGCAAGCCAGAAGTGAAAACCCGGGGCTTAACCCCGCGGATTGCTTTTGGAACTGTCAGGCTGGAGTGCAGGAGGGGCAGGCGGAATTCCTGGTGTAGCGGTGAAATGCGTAGATATCAGGAGGAACACCGGTGGCGAAGGCGGCCTGCTGGACTGTAACTGACACTGAGGCTCGAAAGCGTGGGGAGCAAA
## 440: TAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGGTGGCATGGTAAGCCAGAAGTGAAAACCCAGGGCTCAACTCTGTGGATTGCTTTTGGAACTATCAAGCTAGAGTGCTGGAGGGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGCTTACTGGACAGAAACTGACACTGAGGCTCGAAAGCGTGGGGAGCAAA
## 441: TAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGGTGGCATGGTAAGCCAGAAGTGAAAACCCGGGGCTCAACCCCGCGGATTGCTTTTGGAACTATCAGGCTGGAGTGCTGGAGGGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGCTTACTGGACAGAAACTGACACTGAGGCTCGAAAGCGTGGGGAGCAAA
## 442: TAGGGGGCAAGCGTTGTTCGGAATGACTGGGCGTAAAGGGTGTGTAGGTGGTTTGATAAGTTAGATGTGTAATACCCAGGGCTTAACTCGGGTGCTGCATCTAAAACTGTTAGACTTGAGTACTGGAGAGGATAGTGGAATTCCTAGTGTAGCGGTGGAATGCGTAGATATTAGGAGGAACATCAGTGGCGAAGGCGACTATCTGGACAGCAACTGACACTGAGGCACGAAAGCGTGGGGAGCAAA
## Kingdom Phylum Class Order
## 1: k__Bacteria p__Proteobacteria c__Alphaproteobacteria o__Rickettsiales
## 2: k__Bacteria p__Proteobacteria c__Alphaproteobacteria o__Rickettsiales
## 3: k__Bacteria p__Cyanobacteria c__Chloroplast o__Streptophyta
## 4: k__Bacteria p__Cyanobacteria c__Chloroplast o__Streptophyta
## 5: k__Bacteria p__Cyanobacteria c__Chloroplast o__Streptophyta
## ---
## 438: k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales
## 439: k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales
## 440: k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales
## 441: k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales
## 442: k__Bacteria p__Proteobacteria NA NA
## Family Genus Species baseMean
## 1: f__mitochondria g__Lupinus s__luteus 8.1539852
## 2: f__mitochondria g__Zea s__luxurians 74.2677521
## 3: f__ g__ s__ 20.9865616
## 4: f__ g__ s__ 105.1752823
## 5: f__ g__ s__ 2.2078458
## ---
## 438: NA NA NA 0.1974782
## 439: f__ g__ s__ 1.2531111
## 440: f__Lachnospiraceae g__Coprococcus s__ 0.4991015
## 441: f__Lachnospiraceae g__Coprococcus s__ 0.1049244
## 442: NA NA NA 0.7233974
## log2FoldChange lfcSE stat pvalue padj
## 1: 9.3321835 0.8224754 11.3464590 7.722724e-30 4.876349e-28
## 2: 5.7485723 0.9156034 6.2784526 3.419592e-10 3.285781e-09
## 3: 3.8363204 0.9407839 4.0777914 4.546553e-05 2.543768e-04
## 4: 3.6737619 1.0525083 3.4904826 4.821490e-04 2.254025e-03
## 5: 7.5856134 0.9096882 8.3386964 7.511401e-17 9.764822e-16
## ---
## 438: 0.9060222 0.8051730 1.1252516 2.604824e-01 3.725995e-01
## 439: 2.0331237 0.6033231 3.3698752 7.520224e-04 3.426741e-03
## 440: 0.2409729 0.6382957 0.3775254 7.057832e-01 7.645984e-01
## 441: 2.2740985 1.0726519 2.1200713 3.400003e-02 8.079578e-02
## 442: 1.5507372 1.2630825 1.2277403 2.195445e-01 3.291055e-01
## Significant
## 1: TRUE
## 2: TRUE
## 3: TRUE
## 4: TRUE
## 5: TRUE
## ---
## 438: FALSE
## 439: TRUE
## 440: FALSE
## 441: FALSE
## 442: FALSE
resdt
## OTU
## 1: ACGGGGGGGGCAAGTGTTCTTCGGAATGACTGGGCGTAAAGGGCACGTAGGCGGTGAATCGGGTTGAAAGTGAAAGTCGCCAAAAACTGGTGGAATGCTCTCGAAACCAATTCACTTGAGTGAGACAGAGGAGAGTGGAATTTCGTGTGTAGGGGTGAAATCCGCAGATCTACGAAGGAACGCCAAAAGCGAAGGCAGCTCTCTGGGTCCCTACCGACGCTGGAGTGCGAAAGCATGGGGAGCGAACGG
## 2: ACGGGGGGGGCAAGTGTTCTTCGGAATGACTGGGCGTAAAGGGCACGTAGGCGGTGAATCGGGTTGAAAGTGAAAGTCGCCAAAAAGTGGCGGAATGCTCTCGAAACCAATTCACTTGAGTGAGACAGAGGAGAGTGGAATTTCGTGTGTAGGGGTGAAATCCGTAGATCTACGAAGGAACGCCAAAAGCGAAGGCAGCTCTCTGGGTCCCTACCGACGCTGGGGTGCGAAAGCATGGGGAGCGAACAG
## 3: AGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTCAAGTCCGCCGTCAAATCCCAGGGCTCAACCCTGGACAGGCGGTGGAAACTACCAAGCTGGAGTACGGTAGGGGCAGAGGGAATTTCCGGTGGAGCGGTGAAATGCATTGAGATCGGAAAGAACACCAACGGCGAAAGCACTCTGCTGGGCCGACACTGACACTGAGAGACGAAAGCTAGGGGAGCAAAT
## 4: AGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTTAAGTTCGCCGTCAAATCCCAGGGCTCAACCCTGGACAGGCGGTGGAAACTACCAAGCTGGAGTACGGTAGGGGCAGAGGGAATTTCCGGTGGAGCGGTGAAATGCGTAGAGATCGGAAAGAACACCAACGGCGAAAGCACTCTGCTGGGCCGACACTGACACTGAGAGACGAAAGCTAGGGGAGCGAAT
## 5: AGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTTAAGTTCGCCGTCAAATCCCAGGGCTCAACCCTGGACAGGTGGTGAAAACTACTAAGCTAGAGTACGGTAGGGGCAGAGGGAATTTCCGGTGGAGCGATGAAATGCGTAGAGATCGGAAGGAACACCAACGGCGAAAGCACTCTGCTGGGCCGACACTGACACTGAGAGACGAAAGCTAGGGGAGCGAAT
## ---
## 439: TAGGGGGCAAGCGTTATCCGGAATTACTGGGTGTAAAGGGAGCGTAGGCGGCATGGTAAGCCAGATGTGAAAGCCCGCGGCTTAACCGCGCGGATTGCATTTGGAACTATCAAGCTAGAGTACAGGAGAGGAAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGCTTTCTGGACTGAAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCAAA
## 440: TAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGGTGGCATGGCAAGCCAGAAGTGAAAACCCGGGGCTTAACCCCGCGGATTGCTTTTGGAACTGTCAGGCTGGAGTGCAGGAGGGGCAGGCGGAATTCCTGGTGTAGCGGTGAAATGCGTAGATATCAGGAGGAACACCGGTGGCGAAGGCGGCCTGCTGGACTGTAACTGACACTGAGGCTCGAAAGCGTGGGGAGCAAA
## 441: TAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGGTGGCATGGTAAGCCAGAAGTGAAAACCCAGGGCTCAACTCTGTGGATTGCTTTTGGAACTATCAAGCTAGAGTGCTGGAGGGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGCTTACTGGACAGAAACTGACACTGAGGCTCGAAAGCGTGGGGAGCAAA
## 442: TAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGGTGGCATGGTAAGCCAGAAGTGAAAACCCGGGGCTCAACCCCGCGGATTGCTTTTGGAACTATCAGGCTGGAGTGCTGGAGGGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGCTTACTGGACAGAAACTGACACTGAGGCTCGAAAGCGTGGGGAGCAAA
## 443: TAGGGGGCAAGCGTTGTTCGGAATGACTGGGCGTAAAGGGTGTGTAGGTGGTTTGATAAGTTAGATGTGTAATACCCAGGGCTTAACTCGGGTGCTGCATCTAAAACTGTTAGACTTGAGTACTGGAGAGGATAGTGGAATTCCTAGTGTAGCGGTGGAATGCGTAGATATTAGGAGGAACATCAGTGGCGAAGGCGACTATCTGGACAGCAACTGACACTGAGGCACGAAAGCGTGGGGAGCAAA
## Kingdom Phylum Class Order
## 1: k__Bacteria p__Proteobacteria c__Alphaproteobacteria o__Rickettsiales
## 2: k__Bacteria p__Proteobacteria c__Alphaproteobacteria o__Rickettsiales
## 3: k__Bacteria p__Cyanobacteria c__Chloroplast o__Streptophyta
## 4: k__Bacteria p__Cyanobacteria c__Chloroplast o__Streptophyta
## 5: k__Bacteria p__Cyanobacteria c__Chloroplast o__Streptophyta
## ---
## 439: k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales
## 440: k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales
## 441: k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales
## 442: k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales
## 443: k__Bacteria p__Proteobacteria NA NA
## Family Genus Species baseMean
## 1: f__mitochondria g__Lupinus s__luteus 8.1539852
## 2: f__mitochondria g__Zea s__luxurians 74.2677521
## 3: f__ g__ s__ 20.9865616
## 4: f__ g__ s__ 105.1752823
## 5: f__ g__ s__ 2.2078458
## ---
## 439: NA NA NA 0.1974782
## 440: f__ g__ s__ 1.2531111
## 441: f__Lachnospiraceae g__Coprococcus s__ 0.4991015
## 442: f__Lachnospiraceae g__Coprococcus s__ 0.1049244
## 443: NA NA NA 0.7233974
## log2FoldChange lfcSE stat pvalue padj
## 1: 9.3321835 0.8224754 11.3464590 7.722724e-30 4.876349e-28
## 2: 5.7485723 0.9156034 6.2784526 3.419592e-10 3.285781e-09
## 3: 3.8363204 0.9407839 4.0777914 4.546553e-05 2.543768e-04
## 4: 3.6737619 1.0525083 3.4904826 4.821490e-04 2.254025e-03
## 5: 7.5856134 0.9096882 8.3386964 7.511401e-17 9.764822e-16
## ---
## 439: 0.9060222 0.8051730 1.1252516 2.604824e-01 3.725995e-01
## 440: 2.0331237 0.6033231 3.3698752 7.520224e-04 3.426741e-03
## 441: 0.2409729 0.6382957 0.3775254 7.057832e-01 7.645984e-01
## 442: 2.2740985 1.0726519 2.1200713 3.400003e-02 8.079578e-02
## 443: 1.5507372 1.2630825 1.2277403 2.195445e-01 3.291055e-01
## Significant
## 1: TRUE
## 2: TRUE
## 3: TRUE
## 4: TRUE
## 5: TRUE
## ---
## 439: FALSE
## 440: TRUE
## 441: FALSE
## 442: FALSE
## 443: FALSE
volcano = ggplot(
data = resdt[!is.na(Significant)][(pvalue < 1)],
mapping = aes(x = log2FoldChange,
y = -log10(pvalue),
color = Phylum,
label = OTU, label1 = Genus)) +
theme_bw() +
geom_point() +
geom_point(data = resdt[(Significant)], size = 7, alpha = 0.7) +
# geom_text(data = resdt[(Significant)], mapping = aes(label = paste("Genus:", Genus)), color = "black", size = 3) +
theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5)) +
geom_hline(yintercept = -log10(alpha)) +
ggtitle("DESeq2 Negative Binomial Test Volcano Plot\nBaseline Samples") +
theme(axis.title = element_text(size=12)) +
theme(axis.text = element_text(size=12)) +
theme(legend.text = element_text(size=12)) +
geom_vline(xintercept = 0, lty = 2)
volcano
ggplotly(volcano)
##Ground truth plots
replace_counts = function(physeq, dds) {
dds_counts = counts(dds, normalized = TRUE)
if (!identical(taxa_names(physeq), rownames(dds_counts))) {
stop("OTU ids don't match")
}
otu_table(physeq) = otu_table(dds_counts, taxa_are_rows = TRUE)
return(physeq)
}
## Transform
# Convert raw counts to rlog normalized counts
ps3.rlog <- replace_counts(ps1, dds)
# o__Actinomycetales
ps3_Actinomycetales <- ps3.rlog %>%
subset_taxa(Order == "o__Actinomycetales") %>%
psmelt()
p.Actinomycetales <- ggboxplot(ps3_Actinomycetales, x = "Treatment", y = "Abundance", outlier.shape = NA, add = "median_iqr") +
geom_jitter(mapping = aes(size = 2), alpha = 0.4, na.rm = TRUE, width = 0.3) +
scale_y_log10() +
ylab("Abundance (rlog)") +
theme(axis.title.x = element_blank()) +
theme(legend.position = "NULL") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
facet_grid(~Genus) +
stat_compare_means(method = "wilcox.test", label = "p.format")
p.Actinomycetales
# Subset deseq2 ID'd sequence
ps3_Actinomycetales.seq <- subset(ps3_Actinomycetales, OTU == "GTAGGGTGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAATCCCGAGGCTCAACCTCGGGCTTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAACTGACGCTGAGGAGCGAAAGCATGGGGAGCGAAC")
write.table(ps3_Actinomycetales.seq, file = "./results/ps1_Actinomycetales_seq.txt", sep = "\t")
p.Actinomycetales.seq <- ggboxplot(ps3_Actinomycetales.seq, x = "Treatment", y = "Abundance", outlier.shape = NA, add = "median_iqr") +
geom_jitter(mapping = aes(size = 2), alpha = 0.4, na.rm = TRUE, width = 0.3) +
scale_y_log10() +
ylab("Abundance (rlog)") +
theme(axis.title.x = element_blank()) +
theme(legend.position = "NULL") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
stat_compare_means(method = "wilcox.test", label = "p.format")
p.Actinomycetales.seq
# Dsiplay current R session information
sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] bindrcpp_0.2 corrplot_0.77
## [3] data.table_1.10.4 randomForest_4.6-12
## [5] ggpubr_0.1.5 magrittr_1.5
## [7] microbiome_0.99.42 plotly_4.7.1
## [9] DESeq2_1.16.1 SummarizedExperiment_1.6.3
## [11] DelayedArray_0.2.7 matrixStats_0.52.2
## [13] Biobase_2.36.2 GenomicRanges_1.28.4
## [15] GenomeInfoDb_1.12.2 IRanges_2.10.2
## [17] S4Vectors_0.14.3 BiocGenerics_0.22.0
## [19] knitr_1.17 gridExtra_2.2.1
## [21] vegan_2.4-3 lattice_0.20-35
## [23] permute_0.9-4 RColorBrewer_1.1-2
## [25] phyloseq_1.20.0 dplyr_0.7.2
## [27] purrr_0.2.3 readr_1.1.1
## [29] tidyr_0.7.0 tibble_1.3.4
## [31] ggplot2_2.2.1.9000 tidyverse_1.1.1
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.3-2 rprojroot_1.2
## [3] htmlTable_1.9 XVector_0.16.0
## [5] base64enc_0.1-3 bit64_0.9-7
## [7] AnnotationDbi_1.38.2 lubridate_1.6.0
## [9] xml2_1.1.1 codetools_0.2-15
## [11] splines_3.4.1 mnormt_1.5-5
## [13] geneplotter_1.54.0 ade4_1.7-8
## [15] Formula_1.2-2 jsonlite_1.5
## [17] broom_0.4.2 annotate_1.54.0
## [19] cluster_2.0.6 shiny_1.0.4
## [21] compiler_3.4.1 httr_1.3.1
## [23] backports_1.1.0 assertthat_0.2.0
## [25] Matrix_1.2-11 lazyeval_0.2.0
## [27] acepack_1.4.1 htmltools_0.3.6
## [29] tools_3.4.1 igraph_1.1.2
## [31] gtable_0.2.0 glue_1.1.1
## [33] GenomeInfoDbData_0.99.0 reshape2_1.4.2
## [35] Rcpp_0.12.12 cellranger_1.1.0
## [37] Biostrings_2.44.2 multtest_2.32.0
## [39] ape_4.1 nlme_3.1-131
## [41] crosstalk_1.0.0 iterators_1.0.8
## [43] psych_1.7.5 stringr_1.2.0
## [45] rvest_0.3.2 mime_0.5
## [47] XML_3.98-1.9 zlibbioc_1.22.0
## [49] MASS_7.3-47 scales_0.4.1.9002
## [51] hms_0.3 biomformat_1.4.0
## [53] rhdf5_2.20.0 yaml_2.1.14
## [55] memoise_1.1.0 rpart_4.1-11
## [57] latticeExtra_0.6-28 stringi_1.1.5
## [59] RSQLite_2.0 genefilter_1.58.1
## [61] foreach_1.4.3 checkmate_1.8.3
## [63] BiocParallel_1.10.1 rlang_0.1.2
## [65] pkgconfig_2.0.1 bitops_1.0-6
## [67] evaluate_0.10.1 bindr_0.1
## [69] labeling_0.3 htmlwidgets_0.9
## [71] tidyselect_0.1.1 bit_1.1-12
## [73] plyr_1.8.4 R6_2.2.2
## [75] Hmisc_4.0-3 DBI_0.7
## [77] withr_2.0.0 haven_1.1.0
## [79] foreign_0.8-69 mgcv_1.8-18
## [81] survival_2.41-3 RCurl_1.95-4.8
## [83] nnet_7.3-12 modelr_0.1.1
## [85] rmarkdown_1.6 locfit_1.5-9.1
## [87] grid_3.4.1 readxl_1.0.0
## [89] blob_1.1.0 forcats_0.2.0
## [91] digest_0.6.12 xtable_1.8-2
## [93] httpuv_1.3.5 munsell_0.4.3
## [95] viridisLite_0.2.0